Active learning based data selection for limited resource STT and KWS

Authors

Thiago Fraga-Silva

Jean-Luc Gauvain

Lori Lamel

Antoine Laurent

Viet Bac Le

Abdelkhalek Messaoudi

Abstract

This paper presents first results on using active learning (AL) for training data selection in the context of the IARPA Babel program. Given an initial training data set, the aim is to automatically select additional data from an untranscribed pool for manual transcription. The initial and selected data are then used to build acoustic and language models for speech recognition. The goal of the AL task is to outperform a baseline system built from a predefined data selection of the same size, the Very Limited Language Pack (VLLP) condition. AL methods based on different selection criteria are explored. Compared to the VLLP baseline, improvements are obtained in Word Error Rate and Actual Term Weighted Value for Lithuanian. A description of the methods and an analysis of the results are given. The AL selection also outperforms the VLLP baseline for other IARPA Babel languages and will be further tested in the upcoming NIST OpenKWS 2015 evaluation.
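The abstract does not state which selection criteria were used, so the following is only a minimal sketch of one common active learning criterion, least-confidence selection under a fixed transcription budget. The `Utterance` type, the `confidence` field (assumed to come from a recognizer trained on the initial data), and the greedy budget rule are all illustrative assumptions, not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    utt_id: str
    duration_s: float
    # Hypothetical mean word confidence in [0, 1], assumed to be produced
    # by a seed recognizer trained on the initial data set.
    confidence: float


def select_for_transcription(pool, budget_s):
    """Greedily pick the least-confident utterances that fit the budget.

    This is an assumed criterion for illustration only; the paper explores
    several criteria that the abstract does not describe.
    """
    selected = []
    used_s = 0.0
    # Visit utterances from least to most confident.
    for utt in sorted(pool, key=lambda u: u.confidence):
        if used_s + utt.duration_s <= budget_s:
            selected.append(utt)
            used_s += utt.duration_s
    return selected


# Toy untranscribed pool (made-up values, not data from the paper).
pool = [
    Utterance("a", 4.0, 0.91),
    Utterance("b", 6.0, 0.42),
    Utterance("c", 5.0, 0.67),
    Utterance("d", 3.0, 0.55),
]
chosen = select_for_transcription(pool, budget_s=10.0)
print([u.utt_id for u in chosen])
```

Fixing the budget in seconds of audio mirrors the setup in the abstract, where the AL selection is compared against a predefined selection of the same amount of data.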

Related resources

Developing STT and KWS systems using limited language resources

This paper presents recent progress in developing speech-to-text (STT) and keyword spotting (KWS) systems for the 2014 IARPA-Babel evaluation. Systems have been developed for the limited language pack condition for four of the five development languages in this program phase: Assamese, Bengali, Haitian Creole and Zulu. The systems have several novel characteristics that support rapid development...

Combining tandem and hybrid systems for improved speech recognition and keyword spotting on low resource languages

In recent years there has been significant interest in Automatic Speech Recognition (ASR) and Key Word Spotting (KWS) systems for low resource languages. One of the driving forces for this research direction is the IARPA Babel project. This paper examines the performance gains that can be obtained by combining two forms of deep neural network ASR systems, Tandem and Hybrid, for both ASR and KWS...

Rapid Update of Multilingual Deep Neural Network for Low-Resource Keyword Search

This paper proposes an approach to rapidly update a multilingual deep neural network (DNN) acoustic model for low-resource keyword search (KWS). We use submodular data selection to select a small amount of multilingual data which covers diverse acoustic conditions and is acoustically close to a low-resource target language. The selected multilingual data together with a small amount of the targ...

Speech recognition and keyword spotting for low-resource languages: Babel project research at CUED

Recently there has been increased interest in Automatic Speech Recognition (ASR) and Key Word Spotting (KWS) systems for low resource languages. One of the driving forces for this research direction is the IARPA Babel project. This paper describes some of the research funded by this project at Cambridge University, as part of the Lorelei team co-ordinated by IBM. A range of topics are discussed...

A Model for Project Selecting with Limited Resources in Data Envelopment Analysis with Input and Output Fuzzy

In performance evaluation, selecting a subset from a set of solutions under limited resources is essential. When there is more than one input and output, data envelopment analysis optimization models are used, and performance is measured as the weighted output divided by the weighted input. In this research, two optimization models with limited resources are presented from data envelopment ...

Publication date: 2015
